Abstract: This paper presents a hybrid prediction system consisting of Rough Set Theory (RST) and an Artificial Neural Network (ANN) for processing medical data. As part of developing a new data mining technique and software for efficient medical data analysis, we propose a hybrid tool that incorporates RST and ANN to perform efficient data analysis and make suggestive predictions. In the experiments, a spermatological data set is used to predict the quality of animal semen. The data set is quantized and normalized, and the result is used as a reflection of the internal system state. RST is used to reduce the data and choose the most relevant sets of internal states for predicting semen fertilization potential. The chosen optimal data set is input to a neural network trained with a supervised learning algorithm to predict semen quality. This paper demonstrates that RST is an effective pre-processing tool for reducing the number of ANN inputs without reducing the basic knowledge of the information system, thereby increasing the prediction accuracy of the proposed system. The resulting hybrid prediction system for medical databases is called the Intelligent Rough Neural Network System (IRNNS).



Keywords: Artificial Neural Network, Machine learning 
technique, In-vitro fertilization, Rough sets theory (RST), 
Fertility rate prediction, IRNNS, Hybrid prediction 
I. Introduction

Several machine learning and data mining techniques, such as Artificial Neural Networks (ANN), fuzzy logic and Rough Sets Theory (RST), are used for data classification. There has been a considerable amount of research on ANNs and growing interest in hybrid systems that combine ANNs with other techniques. Neural network and rough set methodologies both have their place among intelligent classification and decision support systems. The knowledge of a system can be seen as organized data sets with the ability to perform classification. Hence, a formal framework capable of reasoning about classifications and deriving implicit facts from explicit knowledge would be helpful. ANN and RST can be combined to obtain such a framework. This approach builds on the feature selection mechanism of rough sets and the efficient classification capability of neural networks. Traditional model construction and simulation-based data mining techniques perform poorly because of the highly nonlinear dynamics and overwhelming complexity of the data being generated.

The knowledge acquired by an ANN through training is represented by the weights of the connections between neurons, the threshold values and the activation function. Because this knowledge is represented implicitly, the problem description cannot be identified at the level of individual neurons; therefore, neural networks are often called 'black boxes'. To improve the quality of learning, RST is used to select key parameters before training the predictor (ANN).

Rough Sets Theory, developed by Z. Pawlak and his co-workers in the early 1980s [1], has become a widely recognized data analysis method for dealing with vagueness and uncertainty in data [2]. RST is founded on the assumption that every object of the universe of discourse is associated with some information [3]. RST describes sets of objects in terms of attribute values, checks dependencies between attributes, evaluates the significance of attributes, reduces attributes and derives decision rules [4]. Rough set based reduction of the attribute space not only improves the efficiency of the predictor itself, but also provides additional information about the mechanisms governing decision-making. One reason for developing hybrid systems is to build more powerful systems that reduce the drawbacks of using a single machine learning technique. Other researchers have proposed similar integrated methods for classification and prediction in other applications [5]-[8].

In this paper, a quick reduct algorithm based on attribute frequency in the discernibility matrix is proposed for pre-processing. We also propose an intelligent rough neural network algorithm for efficient data classification and prediction. The medical data used in this work are in the form of a multi-attribute information table and therefore suit the rough set model. The paper is organized as follows. Section II briefly reviews rough set based data analysis, and Section III discusses ANNs. Section IV presents the hybrid strategy of the proposed model in the data mining setting and the overall structure of the hybrid system. Section V presents illustrative experimental results. Section VI concludes the paper with a brief discussion of the study and future research directions.

II. Rough Sets Theory
The rough set method of data analysis has the following advantages over traditional methods [9], [10]. Unlike probability in statistics or membership grades in fuzzy set theory, the rough set method is based only on the original data set and does not need any external information [11]. It is suitable for both quantitative and qualitative attributes, and it discovers hidden facts in data in the form of decision rules. The derived decision rules describe the knowledge contained in the information table and eliminate redundancy in the original data. The results obtained by the rough set method are simple and explainable. Finding minimal subsets of attributes (reducts) that are efficient for rule making is a central part of the process [12]. RST is a combinatorial tool for reducing quantized data sets by discarding attributes that have no or limited discriminatory power [2], [4] and [13].

A. Basic notions 

The basic notions of RST are: information system, 
approximations, reduction of attributes and others. 

1) Information system: An information system is defined as $I = (U, A)$, where $U$ is a finite, non-empty set of objects called the universe and $A = \{a_1, \dots, a_n\}$ is a finite set of attributes. Each attribute $a \in A$ is a total function $a: U \to V_a$, where $V_a$ is called the domain (value set) of attribute $a$.

An approximation space is an ordered pair $(U, R)$, where $U$ is a finite, non-empty set of objects and $R$ is an equivalence relation on $U$. For any subset $B \subseteq A$ there is an associated equivalence relation, called the B-indiscernibility relation, defined as:

$$IND_A(B) = \{(x, y) \in U^2 \mid \forall a \in B,\ a(x) = a(y)\} \quad (1)$$



If $(x, y) \in IND_A(B)$, then $x$ and $y$ are indiscernible from each other by the attributes in $B$. The indiscernibility relation is an equivalence relation.
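As an illustration of Eq. (1), the sketch below groups objects into B-indiscernibility classes, i.e. the partition $U/IND(B)$. It assumes a hypothetical layout in which the information system is a Python dict mapping each object to a dict of attribute values; the function name `ind_classes` and the toy table are illustrative only.

```python
from collections import defaultdict


def ind_classes(table, attrs):
    """Partition the universe U into IND(B) equivalence classes.

    table: dict mapping object -> dict of attribute -> value (hypothetical layout)
    attrs: iterable of attribute names forming the subset B
    """
    attrs = list(attrs)
    groups = defaultdict(set)
    for obj, row in table.items():
        # Objects with identical values on every attribute in B share a key.
        key = tuple(row[a] for a in attrs)
        groups[key].add(obj)
    return list(groups.values())


# Hypothetical toy information system with condition attributes a1, a2.
U = {
    "x1": {"a1": 1, "a2": 0},
    "x2": {"a1": 1, "a2": 0},
    "x3": {"a1": 0, "a2": 1},
    "x4": {"a1": 1, "a2": 1},
}
classes = ind_classes(U, ["a1", "a2"])
print(sorted(sorted(c) for c in classes))  # [['x1', 'x2'], ['x3'], ['x4']]
```

Here x1 and x2 are indiscernible because they agree on both a1 and a2; dropping a2 from B would merge x4 into the same class.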

2) Approximations: RST provides a simple way to treat uncertainty. Given an information system $I$, let $X \subseteq U$ be a set of objects and $B \subseteq A$ a selected set of attributes. The lower and upper approximations of $X$ with respect to $B$ are defined as:

$$B_*(X) = \bigcup \{Y \in U/IND(B) : Y \subseteq X\} \quad (2)$$

$$B^*(X) = \bigcup \{Y \in U/IND(B) : Y \cap X \neq \emptyset\} \quad (3)$$

The B-lower approximation $B_*(X)$ is the set of all objects in $U$ that can be certainly classified as elements of $X$ using the attributes in $B$, and the B-upper approximation $B^*(X)$ is the set of objects in $U$ that can possibly be classified as elements of $X$. The B-boundary of $X$ in the information system $I$ is defined as $BND_B(X) = B^*(X) - B_*(X)$. The rough set approximations are illustrated in Fig. 1.
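A minimal sketch of Eqs. (2) and (3) follows, assuming the information system is a Python dict mapping each object to a dict of attribute values: every indiscernibility class that lies entirely inside $X$ contributes to the lower approximation, and every class that merely overlaps $X$ contributes to the upper approximation. The function names and the toy table are hypothetical.

```python
from collections import defaultdict


def ind_classes(table, attrs):
    # Group objects that have identical values on every attribute in attrs.
    groups = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(obj)
    return list(groups.values())


def approximations(table, attrs, X):
    """Return (lower, upper, boundary) approximations of X w.r.t. attrs."""
    X = set(X)
    lower, upper = set(), set()
    for Y in ind_classes(table, attrs):
        if Y <= X:   # class lies entirely inside X: certainly in X
            lower |= Y
        if Y & X:    # class overlaps X: possibly in X
            upper |= Y
    return lower, upper, upper - lower


U = {
    "x1": {"a1": 1, "a2": 0},
    "x2": {"a1": 1, "a2": 0},
    "x3": {"a1": 0, "a2": 1},
    "x4": {"a1": 1, "a2": 1},
}
lower, upper, bnd = approximations(U, ["a1", "a2"], {"x1", "x3"})
print(sorted(lower), sorted(upper), sorted(bnd))
# ['x3'] ['x1', 'x2', 'x3'] ['x1', 'x2']
```

The boundary {x1, x2} arises because x1 belongs to X but is indiscernible from x2, which does not.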
Fig. 1. Rough set approximations

3) Reduction of Attributes: A reduct is a minimal subset of attributes that retains the degree of dependence of the decision attributes on the condition attributes. A minimal subset $R \subseteq B \subseteq A$ such that $\gamma_B(Y) = \gamma_R(Y)$, where $\gamma$ denotes the degree of dependency, is called a Y-reduct of $B$ and is denoted $Red_Y(B)$. The core is contained in every reduct and cannot be removed from the information system without deteriorating the basic knowledge of the system. The set of all indispensable attributes of $B$ is called the Y-core. Formally, with $Red_Y(B)$ here denoting the family of all Y-reducts,

$$Core_Y(B) = \bigcap Red_Y(B) \quad (4)$$

The Y-core is the intersection of all Y-reducts of $B$, and it is included in every Y-reduct of $B$.
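To make the reduct and core definitions concrete, the sketch below assumes that $Y$ is the classification induced by a single decision attribute `d` and computes the dependency degree $\gamma$ as the fraction of objects whose indiscernibility class is consistent on `d` (the positive region). It then finds every Y-reduct by exhaustive search and intersects them as in Eq. (4). Exhaustive search grows exponentially with the number of attributes, so this is only a way to check the definitions on small hypothetical tables; the names `gamma`, `all_reducts` and `T` are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def ind_classes(table, attrs):
    # Objects with identical values on all attributes in attrs form one class.
    groups = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(obj)
    return list(groups.values())


def gamma(table, attrs, decision):
    """Degree of dependency of `decision` on `attrs`: |POS| / |U|."""
    pos = sum(len(Y) for Y in ind_classes(table, attrs)
              if len({table[o][decision] for o in Y}) == 1)  # consistent class
    return pos / len(table)


def all_reducts(table, conds, decision):
    """All minimal subsets R of conds with gamma(R) == gamma(conds)."""
    target = gamma(table, conds, decision)
    found = []
    for k in range(1, len(conds) + 1):
        for R in combinations(conds, k):
            if any(set(r) <= set(R) for r in found):
                continue  # a smaller reduct is contained in R, so R is not minimal
            if gamma(table, R, decision) == target:
                found.append(R)
    return found


# Hypothetical decision table: a2 and a3 carry identical information.
T = {
    "x1": {"a1": 0, "a2": 0, "a3": 0, "d": 0},
    "x2": {"a1": 0, "a2": 1, "a3": 1, "d": 0},
    "x3": {"a1": 1, "a2": 0, "a3": 0, "d": 0},
    "x4": {"a1": 1, "a2": 1, "a3": 1, "d": 1},
}
reds = all_reducts(T, ["a1", "a2", "a3"], "d")
core = set.intersection(*(set(r) for r in reds))  # Eq. (4)
print(reds)  # [('a1', 'a2'), ('a1', 'a3')]
print(core)  # {'a1'}
```

Either a2 or a3 can complete a reduct, while a1 appears in every reduct and therefore forms the core.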



4) Accuracy: Accuracy measures how rough a set is. If $B_*(X) = B^*(X) = X$, the set $X$ is precise, or crisp. For non-empty $X$, the accuracy of approximation is expressed by the formula:

$$\alpha_B(X) = \frac{|B_*(X)|}{|B^*(X)|} \quad (5)$$

If $0 \le \alpha_B(X) < 1$, $X$ is rough with respect to $B$, and if $\alpha_B(X) = 1$, $X$ is crisp with respect to $B$.
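Eq. (5) can be checked numerically with the short sketch below. It assumes the information table is a Python dict mapping objects to attribute-value dicts and that $X$ is non-empty; the helper name `accuracy` and the toy table are hypothetical.

```python
from collections import defaultdict


def accuracy(table, attrs, X):
    """alpha_B(X) = |lower(X)| / |upper(X)| as in Eq. (5); X must be non-empty."""
    X = set(X)
    groups = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(obj)
    classes = list(groups.values())
    lower = set().union(*[Y for Y in classes if Y <= X])  # classes inside X
    upper = set().union(*[Y for Y in classes if Y & X])   # classes touching X
    return len(lower) / len(upper)


U = {
    "x1": {"a1": 1, "a2": 0},
    "x2": {"a1": 1, "a2": 0},
    "x3": {"a1": 0, "a2": 1},
    "x4": {"a1": 1, "a2": 1},
}
print(round(accuracy(U, ["a1", "a2"], {"x1", "x3"}), 3))  # 0.333 -> X is rough
print(accuracy(U, ["a1", "a2"], {"x1", "x2"}))            # 1.0 -> X is crisp
```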



III. Artificial Neural Network

An ANN is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation. In most cases, an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network. In more practical terms, neural networks are non-linear statistical data modeling tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data. As computers become faster, the ANN methodology is increasingly used in place of many traditional tools in knowledge discovery and related fields. An ANN is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. Learning in biological systems involves adjustments to the synaptic connections that exist between neurons.

The main types of neural networks, based on their structure, are the single-layer perceptron, the multi-layer perceptron, the back-propagation net, the Hopfield net and the Kohonen feature map. The multi-layer perceptron (MLP) is widely regarded as one of the most effective ANNs for learning classification from examples [14]. In this work, an MLP with the back-propagation supervised learning algorithm is used for experimentation. Because of its hidden layer(s), an MLP can represent any logical operation, including the XOR problem, which a single-layer perceptron cannot solve. The back-propagation algorithm for MLPs is the method of choice for many machine learning tasks [15], [16]. Supervised learning minimizes the error between the desired and computed unit values. The predictive performance of the ANN is measured by computing the mean squared error (MSE), defined as:



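$$MSE = \frac{1}{NP} \sum_{j=1}^{P} \sum_{i=1}^{N} \left(d_{ij} - y_{ij}\right)^2 \quad (6)$$

where $P$ is the number of output processing elements, $N$ is the number of exemplars in the data set, $y_{ij}$ is the network output for exemplar $i$ at processing element $j$, and $d_{ij}$ is the target output for exemplar $i$ at processing element $j$.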
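Eq. (6) translates directly into code. The sketch below assumes targets $d_{ij}$ and outputs $y_{ij}$ are given as lists of $N$ rows with $P$ values each; the function name `mse` is illustrative.

```python
def mse(targets, outputs):
    """Eq. (6): squared error averaged over N exemplars and P output elements.

    targets[i][j] plays the role of d_ij and outputs[i][j] of y_ij.
    """
    n, p = len(targets), len(targets[0])
    total = sum((d - y) ** 2
                for d_row, y_row in zip(targets, outputs)
                for d, y in zip(d_row, y_row))
    return total / (n * p)


print(mse([[1.0], [0.0]], [[0.5], [0.0]]))  # 0.125
```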



IV. INTELLIGENT ROUGH NEURAL NETWORK SYSTEM 
(IRNNS) 

The framework of the hybrid RST and ANN-based learning system is shown in Figs. 2 and 3. These two techniques can be used for both classification and regression tasks without any conversion mechanism. This section describes how the two technologies are incorporated into one Intelligent Rough Neural Network System for efficient processing of medical databases. The proposed hybrid system uses RST for pre-processing the data and an ANN for classification or prediction. Some researchers have proposed and used similar integrated methods in other applications [5], [6]. The algorithm developed for the proposed hybrid system is given below and illustrated in Fig. 2.



A. Algorithm for Intelligent Rough Neural Network 
System 

Algorithm: IRNNS 

Given: A medical data set.

Objective: Obtain a crisp set of influential parameters and construct a suitable ANN architecture for prediction.

// Pre-processing phase using RST.// 

Step 1. Discretize the data. 

Step 2. Construct the information system. 

Step 3. Select influential parameters in the form of 
reduct set by applying Reduct algorithm. 

Step 4. Check the selected parameters by considering their biological importance. If satisfied, go to Step 5 to train the ANN; otherwise, go to Step 1.



//ANN construction and training phase.//

Step 5. Take the data set $(I_n, t_n)$, $n = 1, 2, \dots, k$, where $I_n$ is the input and $t_n$ the target. Split the data into three subsets: training, cross-validation and test sets.
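The paper does not state the split proportions, so the sketch below uses an assumed 60/20/20 split of shuffled $(I_n, t_n)$ pairs. The proportions, the fixed seed and the name `split_dataset` are hypothetical choices for illustration.

```python
import random


def split_dataset(pairs, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle (input, target) pairs and split them into training,
    cross-validation and test sets (the proportions are assumptions)."""
    data = list(pairs)
    random.Random(seed).shuffle(data)  # fixed seed for a reproducible split
    n_train = round(len(data) * train_frac)
    n_val = round(len(data) * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]  # remaining pairs form the test set
    return train, val, test


pairs = [([float(i)], float(i)) for i in range(10)]  # hypothetical (I_n, t_n) pairs
train, val, test = split_dataset(pairs)
print(len(train), len(val), len(test))  # 6 2 2
```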

Step 6. Construct a suitable ANN architecture. Structuring the ANN with the supervised back-propagation learning algorithm involves the following steps:

1) Set all weights of units to random values ranging from -1.0 to +1.0.

2) Set an input pattern to the neurons of the 
net's input layer. 

3) Activate each neuron of the following layer. (Multiply the weight values of the connections leading to the neuron by the output values of the preceding neurons and add up these values. Pass the result to an activation function, which computes the output value of this neuron.)

Fig. 2. Overview of Hybrid IRNNS Model (blocks: medical data set, applying rough sets algorithm, optimal data, data normalization, neural network, ANN prediction system)

Fig. 3. Different stages of Pre-processing with Rough Sets

4) Repeat this until the output layer is reached. 

5) Compare the calculated output pattern with the desired target pattern and compute an error value.

6) Change all weights of each weight matrix using the formula

Weight(new) = Weight(old) + learning rate × output error × output(neuron i) × output(neuron i+1) × (1 - output(neuron i+1))

7) Go to Step 2.

8) The algorithm ends when all output patterns match their target patterns.

Step 7. The constructed ANN is now ready for prediction or classification.

(The performance of this network was subsequently optimized by varying the number of nodes in the hidden layer and removing redundant nodes.)
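The training steps of Step 6 can be sketched as a small pure-Python MLP with one sigmoid hidden layer. This is a hedged illustration, not the authors' implementation: the output-layer update follows the formula in step 6, the hidden-layer update uses the standard back-propagated error, and the bias weights, learning rate, epoch limit and MSE tolerance (used in place of an exact pattern match in step 8) are assumptions. The class name `MLP` and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MLP:
    """Sketch of a one-hidden-layer perceptron trained by online back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Step 1: every weight (plus one bias weight per unit) starts in [-1.0, +1.0].
        self.w_h = [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        self.w_o = [[rng.uniform(-1.0, 1.0) for _ in range(n_hidden + 1)]
                    for _ in range(n_out)]

    def forward(self, x):
        # Steps 2-4: weighted sum of the preceding layer's outputs, then the sigmoid.
        xb = list(x) + [1.0]  # constant bias input
        hb = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in self.w_h] + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(ws, hb))) for ws in self.w_o]
        return xb, hb, out

    def train_pattern(self, x, t, lr):
        xb, hb, out = self.forward(x)
        # Step 5: output error scaled by the sigmoid derivative out * (1 - out).
        d_out = [(tk - ok) * ok * (1.0 - ok) for tk, ok in zip(t, out)]
        # Error back-propagated to each hidden unit (the bias unit has no inputs).
        d_hid = [hb[j] * (1.0 - hb[j]) *
                 sum(self.w_o[k][j] * d_out[k] for k in range(len(d_out)))
                 for j in range(len(hb) - 1)]
        # Step 6: w(new) = w(old) + lr * delta * output of the preceding unit.
        for k, ws in enumerate(self.w_o):
            for j in range(len(ws)):
                ws[j] += lr * d_out[k] * hb[j]
        for j, ws in enumerate(self.w_h):
            for i in range(len(ws)):
                ws[i] += lr * d_hid[j] * xb[i]

    def mse(self, data):
        # Eq. (6): average squared error over all exemplars and output elements.
        errs = [(tk - ok) ** 2
                for x, t in data for tk, ok in zip(t, self.forward(x)[2])]
        return sum(errs) / len(errs)

    def fit(self, data, lr=0.5, epochs=2000, tol=1e-4):
        # Steps 7-8: sweep all patterns repeatedly until the error is small enough.
        for _ in range(epochs):
            for x, t in data:
                self.train_pattern(x, t, lr)
            if self.mse(data) < tol:
                break
        return self


# Hypothetical toy data: two normalized inputs, target = their mean.
grid = [0.0, 0.5, 1.0]
data = [([a, b], [(a + b) / 2]) for a in grid for b in grid]
net = MLP(n_in=2, n_hidden=4, n_out=1, seed=1)
before = net.mse(data)
net.fit(data)
print(before, net.mse(data))
```

In the setting of this paper the inputs would instead be the normalized reduct attributes and the single output the cleavage rate, giving a 5-10-1 network.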

B. Pre-processing with Rough sets theory 

Rough set based reduction of the attribute space improves the efficiency of the predictor itself [13], [17]. The RST pre-processing model is a two-stage approach: the first stage reconstructs the decision table, and the second stage applies an optimal reduct algorithm for data analysis. The different stages of rough set data analysis are illustrated in Fig. 3. The algorithm used in IRNNS is described below.

1) Quick Reduct Algorithm: The basic concept is that the intersection of every entry of the discernibility matrix with the reduct cannot be empty. If some entry $c_{ij}$ had an empty intersection with the reduct, objects $i$ and $j$ would be indiscernible with respect to the reduct, which contradicts the definition of a reduct as a minimal attribute set that discerns all objects.
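The property above can be checked mechanically. The sketch below, assuming a decision table stored as a dict of rows with a decision attribute named `d`, builds the non-empty discernibility entries $c_{ij}$ for object pairs with different decisions and tests whether a candidate attribute set intersects all of them. The function names and the toy table are hypothetical.

```python
from itertools import combinations


def discernibility_entries(table, conds, decision):
    """Non-empty entries c_ij: condition attributes on which objects i and j
    differ, for every pair of objects with different decision values."""
    entries = []
    for i, j in combinations(sorted(table), 2):
        if table[i][decision] != table[j][decision]:
            c = frozenset(a for a in conds if table[i][a] != table[j][a])
            if c:  # an empty entry would mean the table is inconsistent
                entries.append(c)
    return entries


def discerns_all(entries, R):
    """True if R intersects every entry, i.e. no such pair becomes indiscernible."""
    R = set(R)
    return all(c & R for c in entries)


T = {
    "x1": {"a1": 0, "a2": 0, "a3": 0, "d": 0},
    "x2": {"a1": 0, "a2": 1, "a3": 1, "d": 0},
    "x3": {"a1": 1, "a2": 0, "a3": 0, "d": 0},
    "x4": {"a1": 1, "a2": 1, "a3": 1, "d": 1},
}
E = discernibility_entries(T, ["a1", "a2", "a3"], "d")
print(sorted(sorted(c) for c in E))  # [['a1'], ['a1', 'a2', 'a3'], ['a2', 'a3']]
print(discerns_all(E, {"a1", "a2"}), discerns_all(E, {"a1"}))  # True False
```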

Let the reduct set OptRed = $\emptyset$. Sort the entries $c_{ij}$ of the discernibility matrix and examine every entry. If the intersection of $c_{ij}$ with OptRed is empty, a frequent attribute from the short entry $c_{ij}$ is picked and inserted into OptRed; otherwise, the entry is skipped. Attributes that appear in shorter entries and occur more frequently contribute more classification power to the reduct. If $c_{ij}$ contains only one element, that element must be a member of the reduct. The procedure is repeated until all entries of the discernibility matrix have been examined. Finally, the optimal reduct is obtained in OptRed.
Algorithm: Quick reduct algorithm
Input: an information system $(U, A \cup \{d\})$, where $A = \{a_i\}$, $i = 1, \dots, n$.
Output: an optimal attribute set OptRed.
Step 1. OptRed = $\emptyset$; freq($a_i$) = 0 for $i = 1, \dots, n$.
Step 2. Generate the discernibility matrix DisMat.
Step 3. Count the frequency of every attribute $a_i$ in DisMat:
    freq($a_i$) = freq($a_i$) + $n / |c|$ for every entry $c$ of DisMat and every $a_i \in c$
Step 4. Merge and sort the discernibility matrix DisMat.
Step 5. For every entry $c_{ij}$ in DisMat do
{
    Step 6. if ($c_{ij} \cap$ OptRed == $\emptyset$) then
    {
        Step 7. Select the attribute $a_i \in c_{ij}$ with maximal freq($a_i$).
        Step 8. OptRed = OptRed $\cup \{a_i\}$.
    }
}
Step 9. Return OptRed.
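A hedged Python rendering of the quick reduct steps follows. Assumptions not fixed by the pseudocode: $n$ in Step 3 is taken to be the number of condition attributes, entries are sorted by length, and ties in frequency are broken by attribute name. The dict-of-rows layout, the decision attribute name `d` and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def discernibility_entries(table, conds, decision):
    entries = []
    for i, j in combinations(sorted(table), 2):
        if table[i][decision] != table[j][decision]:
            c = frozenset(a for a in conds if table[i][a] != table[j][a])
            if c:  # empty entries arise only from inconsistent objects
                entries.append(c)
    return entries


def quick_reduct(table, conds, decision):
    n = len(conds)
    opt_red = set()                                            # Step 1
    entries = discernibility_entries(table, conds, decision)   # Step 2
    freq = defaultdict(float)
    for c in entries:                                          # Step 3
        for a in c:
            freq[a] += n / len(c)  # attributes in short entries weigh more
    # Step 4: merge duplicate entries, then sort by length (ties by names).
    merged = sorted(set(entries), key=lambda c: (len(c), sorted(c)))
    for c in merged:                                           # Step 5
        if not (c & opt_red):                                  # Step 6
            best = max(sorted(c), key=lambda a: freq[a])       # Step 7
            opt_red.add(best)                                  # Step 8
    return opt_red                                             # Step 9


T = {
    "x1": {"a1": 0, "a2": 0, "a3": 0, "d": 0},
    "x2": {"a1": 0, "a2": 1, "a3": 1, "d": 0},
    "x3": {"a1": 1, "a2": 0, "a3": 0, "d": 0},
    "x4": {"a1": 1, "a2": 1, "a3": 1, "d": 1},
}
print(sorted(quick_reduct(T, ["a1", "a2", "a3"], "d")))  # ['a1', 'a2']
```

Because an attribute is added whenever an entry is not yet hit, the result always intersects every discernibility entry; the greedy choice does not by itself guarantee minimality, which is consistent with using frequency only as a heuristic.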

The quick reduct algorithm can be very useful for classifying unseen objects [18]. The idea of the algorithm is to use attribute frequency as a heuristic. The technique is also applicable to optimal or approximate rule generation, since such rules are also based on the discernibility matrix. Medium-sized noisy data sets can be reduced by this algorithm, and the result can be used as input to an ANN for further classification or prediction.

V. ILLUSTRATIVE EXPERIMENTS 

To illustrate the use of the proposed hybrid method of data classification, consider a spermatological data set from in-vitro fertilization (IVF) test outcomes used for predicting the fertility rate of bull semen. Experts were consulted on the outcomes of the experiments while selecting significant parameters using RST.

A. Data Set 

The spermatological data used in the experiments were collected from the Reproductive Physiology Laboratory of the National Institute of Animal Nutrition and Physiology, Bangalore. Sperm functional parameters such as progressive forward motility, plasmalemma integrity, acrosomal integrity, sperm nuclear morphology and mitochondrial membrane potential were collected. The observed cleavage rate (%) was calculated by dividing the number of cleaved oocytes by the total number of inseminated oocytes.

B. Application of rough sets theory in semen evaluation

Rough set theory is used to find the most effective minimal set of sperm functional attributes, known as the reduct set, for predicting the cleavage rate or fertilization potential. When evaluating semen, the ultimate goal is to accurately predict its fertilizing potential [19].

The decision table representing the spermatological data set is constructed using eight condition attributes and one decision attribute, the observed cleavage rate. To obtain better results, the data set is normalized by selecting the maximum value and dividing all other values by it, a method generally used for normalizing neural network inputs [20]. Since the new decision table contains a discrete set of values, it does not require further discretization when the indiscernibility relation is considered. The next step is creating reducts, which are subsets of attributes that allow rules to be generated from minimal attribute sets. The proposed quick reduct algorithm is applied to create this minimal attribute set, called the reduct. The algorithm uses attribute frequency as a heuristic; it is worth mentioning that finding a minimal subset of attributes is an NP-hard problem [21]. To calculate attribute frequencies, the discernibility matrix is constructed and sorted. Every entry $c_{ij}$ of the discernibility matrix is examined, and the shorter and more frequent attribute $\{a_6\}$ is picked and assigned to OptRed, since attributes that occur frequently in shorter entries contribute more classification power to the reduct. Because $\{a_6\}$ is an entry with only one element, $a_6$ is a member of the reduct according to the algorithm. Repeating the procedure until all entries of the discernibility matrix have been examined yields the optimal reduct in OptRed (Table 2). The obtained optimal reduct set contains all the classification power of the original decision table. All other possible reduct sets based on the indiscernibility matrix are shown in Table 1.
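The normalization described above can be sketched as follows. The paper does not say whether the maximum is taken per attribute or over the whole table; the sketch assumes per-attribute maxima, leaves the decision attribute unchanged, and assumes every maximum is positive. The function name, attribute names and values are hypothetical.

```python
def normalize_by_max(table, attrs):
    """Divide each listed attribute by its maximum over all objects (max > 0 assumed)."""
    maxima = {a: max(row[a] for row in table.values()) for a in attrs}
    return {obj: {a: (v / maxima[a] if a in maxima else v) for a, v in row.items()}
            for obj, row in table.items()}


# Hypothetical raw table: condition attributes a1, a2 and decision attribute d.
raw = {
    "s1": {"a1": 20.0, "a2": 30.0, "d": 25.0},
    "s2": {"a1": 40.0, "a2": 60.0, "d": 35.0},
    "s3": {"a1": 50.0, "a2": 15.0, "d": 40.0},
}
norm = normalize_by_max(raw, ["a1", "a2"])
print([norm[s]["a1"] for s in ("s1", "s2", "s3")])  # [0.4, 0.8, 1.0]
```

Every normalized condition value then lies in (0, 1], which matches the output range of a sigmoid network.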

TABLE 1

Possible Reduct Sets Based on Indiscernibility Matrix

Reduct Set        Support   Length
{a3, a4, a6}      100       3
{a?, a3, a8}      100       3
{a?, a3, a4}      100       3
{a3, a6, a8}      100       3
{a4, a6, a8}      100       3
{a1, a4, a8}      100       3


The biological importance of the parameters is considered when obtaining the optimal reduct set. If the obtained reduct sets are not satisfactory with respect to their biological importance, control returns to Step 1 of the IRNNS algorithm. The reduced (crisp) data set is then used to train the ANN. The experiments to determine the prediction accuracy of the ANN are described in the remainder of this section.



TABLE 2

Optimal Reduct Set Obtained by Applying Quick Reduct Algorithm and Considering Biological Importance

Optimal Reduct Set        Support   Length
{a?, a3, a4, a6, a8}      100       5



C. Network training and classification 

The constructed multi-layer perceptron (MLP) is used to predict the fertility rate of animal semen from the influential IVF parameters obtained. The MLP used to build the ANN model is illustrated in Fig. 4.

(Figure: input layer with the input set, including PFM, HOS and HOSG; hidden layer with sigmoidal synapses; output layer CR.)

PFM - Progressive forward motility, HOS - Hypoosmotic swelling test, SNM - Sperm nuclear morphology, MMP - Mitochondrial membrane potential, SZB - Sperm-zona binding, CR - Predicted cleavage rate

Fig. 4. The constructed sample multi-layer perceptron (MLP) for predicting semen fertility rate.

The optimization of the ANN involves the following processes: (1) selecting training and validation subsets, (2) analysing and transforming the data, (3) selecting variables, (4) constructing and training the network, and (5) verifying the model. A properly trained ANN is capable of generalizing on the basis of the knowledge acquired during the training phase and of correctly inferring the unseen part of the population, even if the sample data contain noisy information. To train the ANN, suitable training, validation and test sets are selected. In this work, the training, validation and test sets consist of the following parameters.

• PFM - Progressive forward motility 

• HOS - Hypoosmotic swelling test 

• HOSG - Hypoosmotic swelling and Giemsa test 

• SNM - Sperm nuclear morphology 

• SZB - Sperm-zona pellucida binding

The target set consists of the observed cleavage rate (CR). The target values corresponding to the training set are taken directly from the recorded field fertility rates of the animals. The input values are pre-processed to guarantee that all training values are converted into the range of possible network outputs, so that the network can be trained. Descriptive statistics for all quantitative input variables used to train the ANN are given in Table 3.

TABLE 3

Descriptive Statistics for all Quantitative Input Variables to Train ANN

Selected Parameter   Mean ± S.E.     Minimum   Maximum
PFM                  43.25 ± 2.69    36.27     49.31
HOS                  39.58 ± 2.32    31.85     47.68
HOS-G                30.63 ± 4.56    24.71     39.39
SNM                  70.14 ± 7.5     65.02     74.66
SZB                  88.27 ± 3.18    73.77     107.09
CR                   38.07           8.63      48.01



The layers of the ANN are created as computer simulations of biological neuron layers. The MLP shown in Fig. 4 has the following characteristics:

1. input layer: 5 nodes, one for each of the five parameters selected for training;

2. hidden layer: one hidden layer with 10 nodes (fixed after analysis);




Fig. 5. Optimum number of nodes for the hidden layer (x-axis: number of nodes)

3. output layer: one node, since the constructed neural network is used to predict a single fertility rate.

The performance of this network was subsequently optimized by varying the number of nodes in the hidden layer, the learning coefficient and the decrease factor of this coefficient, and by selecting the configuration with the highest predictive ability, as illustrated in Fig. 5. Once trained, the network is run on the validation and test sets. The ANN validation phase of our experiment is shown in Fig. 6. The trained ANN is then ready for the prediction phase.
Fig. 6. Desired and actual output of ANN during validation test.
D. Results 

The proposed hybrid prediction system is applied to pre-process the medical database and to train the ANN for prediction. Prediction accuracy is assessed by comparing the observed and predicted cleavage rates (Fig. 7).
Fig. 7. Prediction accuracy: comparison between observed and
predicted cleavage rate. 

VI. Conclusions and Future Work 

The experimental results show that the proposed hybrid architecture is very efficient for medical data analysis and requires significantly less processing time. Since RST is a useful tool for processing incomplete or noisy data, the proposed hybrid architecture is a promising and intuitively sound methodology for large or medium-sized medical databases with incomplete data.

In addition, the results show that hybridizing two machine learning techniques such as ANN and RST is a promising alternative to conventional methods of data analysis in this era of fast computers. With a reduced set of inputs, the training time of the ANN is naturally shorter, and prediction accuracy improves.

RST is a useful pre-processing tool for preparing ANN inputs to improve classification and prediction. The experiments show that hybridizing RST and ANN significantly improves the overall predictive ability of the ANN. The proposed hybrid method is quite effective for classifying patterns in abundant and noisy data. Hybrid strategies are widely accepted as a valid approach to data mining, because no single method is capable of dealing with all data mining settings.

Future work involves incorporating biological information into the model. Another direction for future work is a systematic comparison of different machine learning algorithms, together with the hybridization of rough sets and neural network ensembles to build predictors with further improved performance.